What about data?

The HAM10000 ("Human Against Machine with 10000 training images") dataset, which contains 10,015 dermatoscopic images, was made publicly available through the Harvard Dataverse in June 2018 to provide training data for automating skin lesion classification. The motivation was to give the public an abundant and varied data source for machine learning, so that results can be compared with those of human experts. Successful applications would bring cost and time savings to hospitals and medical professionals alike.

Apart from the 10,015 images, a metadata file with demographic information for each lesion is provided as well. More than 50% of the lesions are confirmed through histopathology (histo); the ground truth for the remaining cases is either follow-up examination (follow_up), expert consensus (consensus), or confirmation by in-vivo confocal microscopy (confocal).

You can download the dataset here: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DBW86T

The 7 classes of skin lesions included in this dataset are:

  1. Melanocytic nevi
  2. Melanoma
  3. Benign keratosis-like lesions
  4. Basal cell carcinoma
  5. Actinic keratoses
  6. Vascular lesions
  7. Dermatofibroma
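The metadata file refers to these classes by short `dx` codes; a small dictionary makes it easy to translate them into readable names:

```python
# Mapping from the `dx` codes used in HAM10000_metadata.csv to readable names
lesion_type = {
    'nv': 'Melanocytic nevi',
    'mel': 'Melanoma',
    'bkl': 'Benign keratosis-like lesions',
    'bcc': 'Basal cell carcinoma',
    'akiec': 'Actinic keratoses',
    'vasc': 'Vascular lesions',
    'df': 'Dermatofibroma',
}

print(lesion_type['mel'])  # → Melanoma
```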

Let's analyze the metadata of the dataset

As you can see, the classes are imbalanced: there are many more images of the lesion type "Melanocytic nevi" than of the other types. This is a common occurrence in medical datasets, so it is important to analyze the data beforehand.
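A quick way to see the imbalance is to count images per diagnosis in the metadata. A minimal sketch (the column name `dx` matches the HAM10000 metadata file):

```python
import pandas as pd

def class_distribution(df: pd.DataFrame) -> pd.Series:
    """Return the number of images per lesion class, largest first."""
    return df['dx'].value_counts()

# Typical usage:
# df = pd.read_csv('HAM10000_metadata.csv')
# print(class_distribution(df))
```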

Let's visualize some examples

Median Frequency Balancing

As we saw above, there is class imbalance in our dataset. To compensate for it, we weight each class by the ratio of the median class frequency to its own frequency, so that rare classes contribute more to the loss.
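A minimal sketch of median frequency balancing: each class weight is the median class frequency divided by that class's frequency, so under-represented classes get weights above 1:

```python
import statistics

def median_frequency_weights(counts):
    """Per-class loss weights: median(freq) / freq_c (totals cancel out)."""
    median = statistics.median(counts.values())
    return {cls: median / n for cls, n in counts.items()}

weights = median_frequency_weights({'nv': 8, 'mel': 4, 'df': 2})
print(weights)  # → {'nv': 0.5, 'mel': 1.0, 'df': 2.0}
```

The resulting weights can later be passed to the loss function, e.g. via the `weight` argument of `nn.CrossEntropyLoss`.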

Pre-processing the dataset

Before we load the data, we need to alter the dataset structure. When you download the dataset, all the images sit together in a single folder. To use the PyTorch dataloader, we need to segregate the images into folders named after their respective labels. You can use the following script to automate the process.
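A sketch of such a script, assuming the metadata CSV has `image_id` and `dx` columns (as in HAM10000) and that images are named `<image_id>.jpg`:

```python
import os
import shutil
import pandas as pd

def sort_images_by_label(metadata_csv, src_dir, dst_dir):
    """Move each image into dst_dir/<label>/ based on the metadata."""
    df = pd.read_csv(metadata_csv)
    for _, row in df.iterrows():
        label_dir = os.path.join(dst_dir, row['dx'])
        os.makedirs(label_dir, exist_ok=True)
        src = os.path.join(src_dir, row['image_id'] + '.jpg')
        if os.path.exists(src):
            shutil.move(src, os.path.join(label_dir, row['image_id'] + '.jpg'))
```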

Data Augmentation

Medical data is notoriously scarce, yet a network needs plenty of data to learn a good model. To tackle this problem we perform data augmentation.

First we normalize the images. Data normalization is an important step which ensures that each input parameter (each pixel, in this case) has a similar data distribution, which makes training converge faster. It is done by subtracting the mean from each pixel and then dividing the result by the standard deviation; the distribution of such data resembles a Gaussian curve centered at zero. Since skin lesion images are natural images, we use the normalization values (mean and standard deviation) of the ImageNet dataset.

We also perform data augmentation:

The augmentation is applied using PyTorch's transforms.Compose() function. Note that we only augment the training set: augmentation exists to aid the training process, so there is no point in augmenting the test set.

Train, Test and Validation Split

We split the entire dataset into 3 parts:

The splitting is done class-wise, so that each class is represented in the same proportion in every subset of the data.
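One way to sketch such a class-wise (stratified) split is scikit-learn's `train_test_split` with the `stratify` argument; the 80/10/10 ratio here is an assumption:

```python
from sklearn.model_selection import train_test_split

def stratified_split(items, labels, seed=0):
    """Split into 80% train, 10% val, 10% test, preserving class ratios."""
    x_train, x_rest, y_train, y_rest = train_test_split(
        items, labels, test_size=0.2, stratify=labels, random_state=seed)
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```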

Now we use the PyTorch data loader to load the dataset into memory.

Let us see some of the training images.

Define a Convolutional Neural Network

PyTorch makes it very easy to define a neural network. Layers such as convolutions, ReLU non-linearities, and max-pooling are available directly from the torch.nn library.

In this tutorial, we use the LeNet architecture introduced by LeCun et al. in their 1998 paper, Gradient-Based Learning Applied to Document Recognition. As the title suggests, the authors' implementation of LeNet was used primarily for OCR and character recognition in documents.

The LeNet architecture is straightforward and small (in terms of memory footprint), making it perfect for teaching the basics of CNNs.
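A sketch of LeNet adapted to our problem: three input channels instead of one and seven output classes. It expects 32x32 inputs, as in the original paper, so the images would be resized accordingly:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, kernel_size=5)   # 32x32 -> 28x28
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)  # 14x14 -> 10x10
        self.pool = nn.MaxPool2d(2, 2)                # halves each spatial dim
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)                       # flatten all but the batch dim
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)                            # raw logits, one per class
```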

Define a Loss function and Optimizer

Let's use a Classification Cross-Entropy loss.

$H_{y'} (y) := - \sum_{i} y_{i}' \log (y_i)$

The most common and effective optimizer currently in use is Adam (Adaptive Moment Estimation). You can look here for more information.
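In PyTorch this amounts to two lines. A sketch (the learning rate is an assumption, and the stand-in linear model takes the place of the real network; the class weights from median frequency balancing could be passed to the loss):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 7)  # stand-in for the real network

criterion = nn.CrossEntropyLoss()  # can take weight=class_weights for imbalance
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # lr is an assumption
```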

These are some helper functions to evaluate the training process.

Train the network

This is when things start to get interesting. We simply loop over the training data iterator, feed the inputs to the network, and optimize.
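The loop can be sketched as follows (device handling and logging kept minimal):

```python
import torch

def train_one_epoch(model, loader, criterion, optimizer, device='cpu'):
    """One pass over the training data; returns the mean loss per sample."""
    model.train()
    total, n = 0.0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()                    # reset gradients from the last step
        loss = criterion(model(inputs), targets)
        loss.backward()                          # backpropagate
        optimizer.step()                         # update the weights
        total += loss.item() * inputs.size(0)
        n += inputs.size(0)
    return total / n
```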

Plot the training and validation loss curves.

Test the network on the test data

We have trained the network over the training dataset. But we need to check if the network has learnt anything at all.

We will check this by predicting the class label that the neural network outputs, and checking it against the ground-truth. If the prediction is correct, we add the sample to the list of correct predictions.
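A sketch of that evaluation loop, which returns overall accuracy as the fraction of correct predictions:

```python
import torch

@torch.no_grad()  # no gradients needed at test time
def evaluate(model, loader, device='cpu'):
    """Fraction of samples whose predicted class matches the ground truth."""
    model.eval()
    correct, total = 0, 0
    for inputs, targets in loader:
        inputs, targets = inputs.to(device), targets.to(device)
        preds = model(inputs).argmax(dim=1)   # predicted class label
        correct += (preds == targets).sum().item()
        total += targets.size(0)
    return correct / total
```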

Okay, first step. Let us display an image from the test set to get familiar.

Okay, now let us check the performance of the network on the test set:

That looks better than chance, which is about 14% accuracy (randomly picking one of 7 classes). It seems the network learnt something. But maybe it doesn't learn all the classes equally well.

Let's check which classes performed well and which did not.

Confusion Matrix
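scikit-learn can build the confusion matrix from the collected predictions. A sketch with toy class indices (rows are true classes, columns are predicted classes):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 2, 2, 2]   # ground-truth class indices
y_pred = [0, 1, 1, 2, 2, 0]   # network predictions
cm = confusion_matrix(y_true, y_pred)
print(cm)                     # cm[i][j] = samples of class i predicted as class j
```

The diagonal holds the correctly classified counts; everything off-diagonal tells us which classes get confused with which.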

Grad-CAM
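Grad-CAM highlights the image regions that drive a prediction: gradients of the class score are average-pooled into per-channel weights, which then weight the activations of a chosen convolutional layer. A minimal hook-based sketch (upsampling the map to the input size and overlaying it on the image is omitted):

```python
import torch
import torch.nn.functional as F

def grad_cam(model, layer, image, class_idx):
    """Return a normalized class activation map for one image."""
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(v=go[0]))
    logits = model(image.unsqueeze(0))
    model.zero_grad()
    logits[0, class_idx].backward()            # gradients of the chosen class score
    h1.remove(); h2.remove()
    w = grads['v'].mean(dim=(2, 3), keepdim=True)        # pool gradients per channel
    cam = F.relu((w * acts['v']).sum(dim=1)).squeeze(0)  # weighted activation map
    return cam / (cam.max() + 1e-8)                      # normalize to [0, 1]
```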

Analysis of the results

As we can see from the results of the LeNet model, our system is not capable of handling the complexity of the given input images. Our final accuracy on the test data was 61%, meaning about 39% of the images are misclassified, which is far too poor for any clinical use case.

These results could be substantially improved by opting for a deeper, more complex network architecture than LeNet, allowing richer learning of the corresponding image features.

Switching to a superior network architecture: ResNet18

Define a Loss function and Optimizer

As before, we use the classification cross-entropy loss, the Adam optimizer, and the same helper functions to evaluate the training process.

Train the network

This is when things start to get interesting. We simply loop over the training data iterator, feed the inputs to the network, and optimize.

Plot the training and validation loss curves.

Test the network on the test data

We have trained the network over the training dataset. But we need to check if the network has learnt anything at all.

We will check this by predicting the class label that the neural network outputs, and checking it against the ground-truth. If the prediction is correct, we add the sample to the list of correct predictions.

Okay, first step. Let us display an image from the test set to get familiar.

Okay, now let us check the performance of the network on the test set:

Confusion Matrix

Grad-CAM

Conclusion

Training a neural network can be a daunting task, especially for a beginner. Here are some useful practices to get the best out of your network.

In this tutorial we learned how to train a deep neural network for the challenging task of skin-lesion classification. We experimented with two network architectures and provided insights into the attention of the models. Additionally, we achieved 83% overall accuracy on HAM10000 and shared tips and tricks to tackle overfitting and class imbalance.

Now you have all the tools to not only beat our performance and participate in the exciting MICCAI Challenges, but to also solve many more medical imaging problems.

Happy training!